Capacity Analysis Using Closed-System Modules

ABSTRACT

Disclosed are various embodiments of a capacity analysis tool using closed-system modules. In one example, a system generates a target model for a targeted computing system that includes a first virtual machine and a second virtual machine. The target model can include system models for the first virtual machine and the second virtual machine, and each of the system models can represent one of multiple parameters of the first virtual machine and the second virtual machine. A function of time for each of the parameters can be generated based at least in part on a time series of datapoints for the parameters. An estimated point in time of contention between a first parameter for the first virtual machine and a second parameter for the second virtual machine can be identified. A usable capacity for the first parameter and the second parameter can be determined.

BACKGROUND

“Capacity planning” involves scheduling the acquisition and management of resources to meet estimated future demands on a target system. For example, a data-center operator may need to estimate the space, computer hardware, software, network, and other resources that will be needed over some future period of time. A typical capacity concern of many enterprises is whether resources will be in place to handle increased demand, e.g., as the number of users or interactions increases. Ideally, capacity is added in time to meet the anticipated demand, but not so early that resources go unused for a long period.

Although capacity planning finds particular applicability to data centers, that is, computer systems including large numbers of physical computers running a variety of workloads, capacity planning is generally applicable to a wide range of endeavors including, but not limited to, airline operations, traffic management, and facilities acquisition.

“Capacity analysis” involves characterizing a capacity-planning target (CPT) system as a basis for making capacity-planning decisions. A capacity-analysis tool (CAT) identifies to a user, typically a system administrator for the CPT system, the information needed for capacity analysis. The user provides the information, e.g., the processing capacity of a server, or ensures that the requested information, e.g., time-varying usage data, is provided to the CAT. Based on this information, the CAT can estimate, for example, the efficiency with which resources are used or the degree to which resources are wasted, an amount of unused resource capacity, and how much time remains before (e.g., rising) demand reaches capacity. These estimates are then available to guide planning the management and inventory of resources for a CPT system.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a capacity analysis system.

FIG. 2 is a schematic diagram of a virtual-machine host and a model thereof, both of the capacity analysis system of FIG. 1.

FIG. 3 is a graph representing a closed-system template of a capacity-analysis tool (CAT) of the capacity-analysis system of FIG. 1.

FIG. 4 is a nested hierarchical diagram of a CPT system of FIG. 1.

FIG. 5 is a schematic diagram of a CPT model of the CPT system of FIG. 4.

FIG. 6 is a flow chart of a capacity-analysis process implementable in the capacity-analysis system of FIG. 1 and in other systems.

FIG. 7 is a schematic view of a programmed hardware computer system including elements of the capacity-analysis system of FIG. 1.

DETAILED DESCRIPTION

During the development of the present invention, it was recognized that CATs can sometimes neglect factors relevant to a capacity analysis. Even if a CAT addresses an adequate set of factors when first used, it might fail to address a factor that becomes impactful over time. For example, suppose a CAT allows a user to create a CPT model that adequately characterizes a CPT system, but that, later, the electric power supplier for the CPT system is forced to ration electric power. If the CAT does not provide for modifying the model to treat electric power as a factor bearing on capacity analyses, the model's usefulness for capacity analysis will be impaired.

For another example, suppose a data-center operator enters a service-level agreement that limits disk-access latencies to 10 milliseconds. If a CPT model does not work with latency as a dimension, then the CPT model's estimates may not provide a useful guide for capacity planning. Even if the CAT calls for tracking a number of disk input-output operations per second, from which disk-access latency may be estimated, the information may be in an inconvenient form from the user's perspective. It would be preferable for the CAT to deal with disk-access latency directly, rather than indirectly through a parameter of little direct interest to the user.

The present invention provides a CAT that allows the user to create new modules for a modular CPT model. The modules, in this case, are models of closed systems (i.e., systems having capacity limits). The CAT allows a user to create a closed-system module (CSM) by specifying a dimension (e.g., processing amount, memory amount, disk-storage amount, or disk input-output bandwidth) along with associated parameters (properties and metrics), e.g., for capacity, demand, and/or usage associated with the respective dimension. The user has flexibility in selecting dimensions and parameters to match the CPT system and contextual factors such as service-level agreements.

A model for a CPT system includes models of its resource-container components. Thus, a model of a data center may include a model of a physical server. The model of the physical server can include several CSMs, e.g., for processing in cycles per second, memory in gigabytes, disk storage in terabytes, and disk-storage access in input-output operations per second. If electricity is rationed, the CAT permits the user to define a CSM for power consumption and add it to the server model. If the user prefers access latency in milliseconds to input-output operations per second, the CAT allows the user to create a CSM with access latency as the dimension; the user can then substitute the access-latency CSM for the input-output operations CSM.

Furthermore, the CAT permits a user to select the parameters to monitor for each CSM. In most cases, capacity and demand will be of interest for each CSM. However, there may be more than one relevant capacity parameter, e.g., total physical capacity, usable capacity, and capacity limits imposed as a matter of management policy. The CAT permits any one or more of these parameters to be used. Many systems do not provide demand data per se, so the CAT permits a usage parameter to be selected along with other parameters that allow demand to be determined from usage.

The flexibility to design and select arbitrary CSMs for a CPT model has numerous advantages. The user, typically an administrator or other expert for the CPT system, can design and create a CPT model that matches the CPT system as well as the user's preferences in terms of dimensions, parameters, and units. If circumstances change, the existing model can be extended, e.g., by adding or replacing CSMs, without modifying the CAT itself.

A modular CPT model breaks complex computations into more manageable chunks, which speeds processing and can make otherwise intractable problems solvable. Computations that might be infeasible in multiple dimensions can become manageable when dealt with one dimension at a time. Each CSM can be analyzed independently, e.g., to determine the time remaining before demand equals capacity for the respective dimension. The results can then be combined to yield a capacity analysis for the CPT system or component of interest. For example, the shortest time remaining among the CSMs of a CPT system can be taken as the time remaining before the CPT system will fail to meet the demand for at least one resource.

Because of the CAT's versatility, the same tool can be used for different CPT systems. Instead of using different CATs for different aspects of a data center (e.g., different tools for managing blades, different types of clusters, licensing, etc.), a single general-purpose CAT can provide models for all aspects of a data center. Furthermore, the same tool can be applied to other aspects of an enterprise, e.g., capacity planning for airlines, mass transit, shipping routes, parking lots, and so on. For example, dimensions used to characterize a parking lot can include area, number of spots, incoming vehicle rate, average vehicle size, and outgoing vehicle size. Different default dimensions and parameters can be provided for different applications, while the operation of the CAT itself remains consistent across CPT systems.

A capacity-analysis system 100, shown in FIG. 1, includes a capacity-planning-target (CPT) system 102 and its resource-container components 104. Note that a component of a CPT system can itself be considered a CPT system. CPT system 102 can be, for example, a data center, with clusters, host servers, and virtual machines as components. However, very different CPT systems are also provided for, e.g., an airline with routes and planes as components.

Capacity-analysis system 100 is designed to provide estimates 106 to be employed in capacity planning for CPT system 102, as shown in FIG. 1. For example, remaining capacities, time remaining before capacity is consumed, and resource-usage efficiency can be estimated. Estimates 106 can be provided using a CPT model 108, which includes a container model 109, algorithms 112, functions of time F(t) 114, and correlation models 116. Container model 109 includes CSMs 110, which serve as building blocks for CPT model 108.

CSMs 110 are models of aspects of CPT system 102. Each CSM 110 has an associated dimension (e.g., memory, processing, disk input-output bandwidth) and an associated capacity parameter. In CPT model 108, each CSM 110 is associated with a single dimension to provide the greatest simplification of computations. An alternative embodiment includes a CSM with more than one dimension. Once estimates have been made for the CSMs, the estimates can be combined to provide a multi-dimensional characterization of CPT system 102 and/or components 104 thereof.

CPT model 108 can take the form of an eXtensible Markup Language (XML) document. Modification of CPT model 108 can then be implemented by editing the XML document using a capacity-analysis tool (CAT) 120. CAT 120 provides a model framework 122 that includes a closed-system template 124. Closed-system template 124 defines a capacity-dimension role and associated parameter roles. A CSM can be defined at least in part by assigning a dimension (e.g., processing amount, memory amount) to the dimension role and parameters (e.g., processing capacity, processing demand, processing usage) to the parameter roles. Model framework 122 further includes generalized algorithms 126, the arguments of which are parameter roles. Algorithms 112 of CPT model 108 are created by assigning parameters to the parameter roles. Each parameter is to be evaluated repeatedly so that a time series 130 of datapoints is generated for each parameter. The various time series 130 can be stored in a database 132, from which they are accessed by CPT model 108.
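As an illustration only, the following Python sketch shows how a CSM might be serialized into an XML CPT model by filling in a closed-system template's roles. The element and attribute names (`cpt-model`, `csm`, `dimension`, `parameter`, `role`, `series`), the role names, and the time-series identifiers are all hypothetical; the description does not specify an XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical parameter-role names; the template in the description
# includes roles such as capacity, demand, and usage.
PARAMETER_ROLES = ("total_capacity", "usable_capacity", "demand", "usage")


def make_csm(component: str, dimension: str, units: str,
             assignments: dict[str, str]) -> ET.Element:
    """Fill a closed-system template: one dimension plus role assignments.

    `assignments` maps a parameter role to the identifier of the time
    series (in the database) that is assigned to that role.
    """
    unknown = set(assignments) - set(PARAMETER_ROLES)
    if unknown:
        raise ValueError(f"unknown parameter roles: {sorted(unknown)}")
    csm = ET.Element("csm", component=component)
    ET.SubElement(csm, "dimension", units=units).text = dimension
    for role, series_id in assignments.items():
        ET.SubElement(csm, "parameter", role=role, series=series_id)
    return csm


model = ET.Element("cpt-model")
model.append(make_csm("host-201", "processing", "MIPS",
                      {"total_capacity": "host201.cpu.capacity",
                       "usage": "host201.cpu.usage"}))
print(ET.tostring(model, encoding="unicode"))
```

Rejecting unknown role names mirrors the idea that parameters can only be assigned to roles the template defines; a user-extensible template would simply add names to `PARAMETER_ROLES`.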

A user 134, e.g., a CPT-system expert, assigns dimensions and parameters to their respective roles to create CSMs 110 and CPT model 108. CAT 120 can guide user 134 by stepping through components 104 of CPT system 102 (by component or class of components). To this end, CAT 120 has access to CPT configuration data 136, which lists components of CPT system 102 and characterizes their relationships (e.g., host vs. guest); such configuration data is typically available for managing a large system. CAT 120 can provide default selections for dimensions and parameters as well as lists of possible dimensions and parameters to guide user 134. However, a user can define new dimensions and new parameters as needed for a particular CPT system.

Once it has been created, CPT model 108 can be trained using a machine-learning engine 140. Machine-learning engine 140 determines functions F(t) 114 of CPT model 108 that fit time series 130 provided from database 132. In the process, machine-learning engine 140 finds periodic patterns and trends. Furthermore, machine-learning engine 140 includes a correlation engine 142 that finds inter-parameter (metric-to-metric) correlations, which are used to fill out correlation models 116 of CPT model 108. Correlation models 116 can include a correlation table for the metrics of a single object. Correlation engine 142 permits any container to be modeled so that its capacity, and the relationships among its metrics, can be determined automatically. This, in turn, allows the impact of a what-if change to be propagated to all metrics that are collected. For example, access latency may increase as the rate of disk input-output operations increases. Knowing the correlation can permit one unknown to be estimated based on a known or calculated function or value of another parameter. The correlation functions may be linear or non-linear, and univariate or multivariate.

Machine-learning engine 140 can use these correlations to create functions for parameters for which there is no time-series counterpart (either because such data is not provided by the CPT system or because such data is not specified in a what-if scenario 144). For example, machine-learning engine 140 can generate demand data and functions of time based on data associated with usage and other parameters. Once training is complete, the resulting functions 114 and correlation models 116 can be used to estimate future values, such as the time remaining before demand or usage reaches capacity.

FIG. 1 includes a graphic representation of closed-system template 124. A graph of a capacity dimension versus time includes a constant-capacity line 150 and an increasing demand line 152, which have respective values C and D at a present time T1. These values are used in various algorithms 126 for making capacity-planning estimates. Thus, the remaining capacity at time T1 is C-D and the efficiency of resource usage is D/C. Demand line 152 intersects capacity line 150 at a future time T2. Therefore, the remaining time before demand reaches capacity is T2-T1. In general, one would like capacity to remain above demand by some moderate-sized margin.
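The template arithmetic can be written out directly. This sketch assumes, as in the FIG. 1 graph, that capacity is constant and that demand grows linearly; the function and argument names are illustrative and are not part of the described CAT.

```python
def template_estimates(capacity: float, demand_now: float,
                       demand_slope: float, t1: float) -> dict:
    """Estimates from the closed-system template in FIG. 1.

    Assumes constant capacity C and demand that grows linearly at
    `demand_slope` units per unit of time from D at the present time T1.
    The time remaining is negative if demand already exceeds capacity.
    """
    c, d = capacity, demand_now
    remaining = c - d      # remaining capacity at T1: C - D
    efficiency = d / c     # efficiency of resource usage: D / C
    if demand_slope > 0:
        t2 = t1 + (c - d) / demand_slope  # where the demand line meets capacity
    else:
        t2 = float("inf")                 # demand never reaches capacity
    return {"remaining": remaining, "efficiency": efficiency,
            "t2": t2, "time_remaining": t2 - t1}


print(template_estimates(capacity=100.0, demand_now=60.0, demand_slope=2.0, t1=10.0))
```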

“DIMENSION VALUE” in FIG. 1 is a value for a dimension to be assigned to a dimension role; once a dimension is assigned to the dimension role, parameters can be assigned to respective parameter roles. Each assigned parameter is associated with a time series 130 in database 132. For example, user 134 may assign processing capacity to the capacity-dimension role for one CSM 110 and memory capacity to the capacity-dimension role for another CSM 110. User 134 may select the units for the dimension that are most convenient to the user, e.g., cycles per second (CPS), millions of instructions per second (MIPS), or ticks for a processing dimension.

“CAPACITY” and “DEMAND” in FIG. 1 represent parameter roles to which user 134 can assign parameters for each given CSM. For example, processing capacity can be assigned to the capacity parameter role and processing demand can be assigned to the demand parameter role for a closed system corresponding to the processing capacity of a CPT-system component. Each assigned parameter can be identified in terms of the corresponding time series 130 in database 132.

For a given one-dimensional CSM, capacity-analysis calculations can be fairly straightforward where demand is known. However, for many CPT systems and components, demand data is not directly available and so must be derived from usage, for which data is generally available. Thus, FIG. 1 shows CPT system 102 providing usage data 146 to database 132. For many CPT systems, usage tracks demand most of the time. However, usage cannot exceed capacity, and thus cannot match demand when demand exceeds capacity. In such cases, usage data recorded before capacity was reached may be extrapolated to estimate demand during the period in which usage is limited to capacity.

The relationship between usage and demand can be complex in hierarchical CPT systems. For example, as shown in FIG. 2, physical server host 201 (of CPT system 102) hosts virtual machines VM1 and VM2. If, for example, each of virtual machines VM1 and VM2 has a demand for 75% of the processing capacity of host 201, total demand (150% of capacity) will exceed capacity even though each virtual machine has a demand below capacity.

A CPT model 210 for host 201 can include dozens of CSMs, including: a) a processing CSM 212 for host 201; b) a processing CSM 214 for virtual machine VM1; and c) a processing CSM 216 for virtual machine VM2. (In addition, host 201 and virtual machines VM1 and VM2 can be represented by additional CSMs associated with other dimensions, such as memory, disk storage, etc.) As shown for CSM 214, virtual machine VM1 has a constantly increasing demand for processing resources. As shown for CSM 216, virtual machine VM2 has a constant demand for processing resources. As long as the total demand is less than capacity, usage matches demand. However, once total demand reaches capacity, usage plateaus and falls behind demand.
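The behavior shown for host 201 can be reproduced with a toy simulation. The numbers below are invented: VM1's demand grows by 10 units per step, VM2's is fixed at 30, and the host capacity is 100. Usage is modeled as total demand capped at capacity, which is the simplification described above.

```python
def host_usage(demands: list[float], capacity: float) -> float:
    """Usage tracks total demand until total demand reaches capacity."""
    return min(sum(demands), capacity)


CAPACITY = 100.0  # hypothetical host processing capacity
for t in range(8):
    vm1 = 20.0 + 10.0 * t   # steadily increasing demand (cf. CSM 214)
    vm2 = 30.0              # constant demand (cf. CSM 216)
    total = vm1 + vm2
    usage = host_usage([vm1, vm2], CAPACITY)
    print(f"t={t}  demand={total:6.1f}  usage={usage:6.1f}  "
          f"unmet={total - usage:5.1f}  contention={total > CAPACITY}")
```

From t=6 onward the printed usage stays at 100 while demand keeps rising, which is the plateau described for the host.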

Once the total demand exceeds capacity, the virtual machines are said to be in “contention”, so that the demands of at least one of the virtual machines will not be met. One possible approach to contention is to divide available capacity evenly between virtual machines. However, such an approach may not be optimal where one virtual machine is running a higher-priority workload than the other. Accordingly, more sophisticated approaches are provided for handling contention situations; in particular, closed-system template 124 (FIG. 1) provides parameter roles designed to handle contention and other issues that can arise in containment hierarchies.

Closed-system template 124 is shown in FIG. 3 along with representations of parameter roles to which parameters can be assigned by user 134. “Total capacity” 302 is a parameter role intended to correspond with parameters representing the full or nominal capacity of a system. For example, if 128 gigabytes (GB) of memory is installed in a host, then 128 GB can be the total capacity for the host. Assuming each virtual machine running on the host potentially has access to the entire memory, 128 GB can be the total capacity for each virtual machine running on the host, even though a virtual machine may have to share that capacity with one or more other virtual machines.

Less than all of the total capacity of a resource may be available to a component. For example, hypervisor and other system files and processes may limit the processing, memory, and storage capacity available to virtual machines. In addition, the efficiency with which a resource can be used may decrease as usage approaches capacity, e.g., due to packing inefficiencies such as disk fragmentation. Accordingly, a “usable capacity” parameter role 304 is provided. Whereas a hardware specification is typically used as a total-capacity parameter, usable capacity can be determined empirically, e.g., as the level at which usage peaks are truncated. Similarly, an “overhead” parameter role 306 provides for assignment of a parameter corresponding to the system files and processes of a component.
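The empirical rule "the level at which usage peaks are truncated" could be approximated with a heuristic like the one below. The relative tolerance and minimum hit count are arbitrary assumptions, and a practical estimator would likely need to be more robust to noise; the sketch only illustrates the idea.

```python
from typing import Optional


def usable_capacity(usage: list[float], rel_tol: float = 0.01,
                    min_hits: int = 3) -> Optional[float]:
    """Heuristic: if usage repeatedly tops out at nearly the same level,
    treat that level as the empirically usable capacity.

    Returns None when no such plateau has been observed yet.
    """
    peak = max(usage)
    hits = sum(1 for u in usage if u >= peak * (1 - rel_tol))
    return peak if hits >= min_hits else None


samples = [40, 71.5, 72, 55, 71.8, 72, 30, 72]  # hypothetical memory usage, GB
print(usable_capacity(samples))
```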

Especially in scenarios in which child components (e.g., virtual machines) contend for the resources of a parent component (e.g., a host server), it may be undesirable to allow one of the child components to consume all of a particular resource. For example, if one virtual machine consumes all available capacity, a co-resident virtual machine may be starved for resources and not get any work done. Accordingly, a policy-based upper “limit” parameter role 308 provides for setting a maximum amount of a resource that can be allocated to a particular child or other component. Correspondingly, a policy-based lower limit or “reservation” parameter role 310 allows a user to specify a minimum level of resources to be guaranteed to a component, e.g., to ensure the component has the resources it needs to function at least at a minimal level or to meet the terms of a service-level agreement.
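Limits and reservations constrain how contended capacity is divided among children. The sketch below is one possible allocation policy; the description does not prescribe one. Each child first receives its reservation (up to what it wants), and leftover capacity is then shared equally, with no child exceeding the smaller of its demand and its limit. The equal-share rule and the function name are assumptions chosen for simplicity.

```python
def allocate(capacity: float,
             vms: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    """Divide contended capacity among child VMs.

    `vms` maps a name to (demand, reservation, limit). Each VM first gets
    its reservation (capped by what it wants); leftover capacity is then
    shared equally, and no VM receives more than min(demand, limit).
    """
    want = {n: min(d, lim) for n, (d, _r, lim) in vms.items()}
    alloc = {n: min(want[n], r) for n, (_d, r, _lim) in vms.items()}
    left = capacity - sum(alloc.values())
    if left < 0:
        raise ValueError("reservations exceed capacity")
    active = {n for n in vms if alloc[n] < want[n]}
    while left > 1e-12 and active:
        share = left / len(active)
        for n in sorted(active):
            extra = min(share, want[n] - alloc[n])
            alloc[n] += extra
            left -= extra
        # Drop VMs that are now fully satisfied.
        active = {n for n in active if alloc[n] < want[n] - 1e-12}
    return alloc


print(allocate(100.0, {"VM1": (90.0, 10.0, 70.0), "VM2": (40.0, 30.0, 100.0)}))
```

Here VM2's whole demand of 40 is met, and VM1 receives the remaining 60, which stays below its limit of 70.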

Other embodiments may provide for other parameter roles, and may omit one, some, or all of the parameter roles listed in FIG. 3. Moreover, a user may create new parameter roles by leveraging extensibility features. The parameter roles assigned can vary across CSMs and components. For example, an entitlement parameter may be assigned to a processing CSM but not to a memory CSM for the same or a different component.

A user assigns a parameter to a parameter role by associating a respective time series 130 (FIG. 1) in database 132 with the parameter role. The time series may be provided directly from CPT system 102, e.g., for a usage parameter. In other cases, the time series may result from repeating a constant value specified for the hardware, e.g., an amount of installed memory or a speed rating for a processor. Other time series may be generated by a machine-learning engine, e.g., usable-capacity parameter values may be generated based on usage data. Alternatively, a machine-learning engine can generate a function of time F(t) directly instead of first generating a time series.

The parameter assigned to a parameter role assumes a respective role in CPT model 108 and machine-learning engine 140. For example, parameters assigned to capacity and demand roles will be used in computing the amount of capacity remaining. In some cases, machine-learning engine 140 may issue an alert when it appears that the parameter's time series does not correspond to the role to which the parameter was assigned.

For systems in which demand data is not provided directly, demand can be determined from usage data 146. To this end, knowledge of other parameters that can affect the relationship between usage and demand can provide for more accurate estimates. For example, time-series values of a contention parameter can be used to improve an estimate of demand based on usage.

There are many other relationships between parameter roles that can permit values of one parameter to be calculated based on values of other parameters, e.g., in what-if scenarios. For example, the reservation level for one virtual machine can limit the usable capacity for another virtual machine. Machine-learning engine 140 makes use of the relationships among parameter roles to refine CPT model 108, while CPT model 108 can use these relationships to provide better capacity-analysis estimates. For example, for host 201 of FIG. 2, assigning a reservation parameter to virtual machine VM1 can affect the usable capacity available to its sibling component, VM2.

As explained above, CPT model 108 (FIG. 1) is created as user 134 assigns dimensions and parameters to their respective roles. Once model 108 is created, machine-learning engine 140 operates to train model 108. Training results in the generation of functions of time F(t) 114 based on time series 130. Even after model 108 is trained, machine-learning engine 140 can continue to operate, as model 108 may need to be adjusted as CPT system 102 is reconfigured and new CSMs, dimensions, and parameters are assigned to model 108.

Machine-learning engine 140 fits functions of time F(t) 114 to time series 130, determines correlations among functions, and derives some functions from other functions. For example, machine-learning engine 140 can fit a function to usage data so that usage at a future time can be estimated. A time function for usable capacity can be determined from the usage function. A demand function can be derived from a usage function with the help of time functions for other parameters, e.g., contention, that cause usage to deviate from demand in known ways.

Model 108 uses functions of time 114 to extrapolate into the future. For example, usage, expressed as a function of time, can be extrapolated to estimate a time at which usage will match usable capacity. Once demand as a function of time has been determined, the same kind of extrapolation can indicate when demand will match capacity.
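Given any fitted function of time, the crossing time can be found numerically. This sketch steps forward in time until the function first reaches the target level and then refines the crossing by bisection. It assumes the function is continuous between steps; a crossing that occurs and reverts within a single step may be missed. The function names and the example usage function are hypothetical.

```python
from typing import Callable, Optional


def first_crossing(f: Callable[[float], float], level: float,
                   t_start: float, t_end: float,
                   step: float = 1.0) -> Optional[float]:
    """Approximate the first time in [t_start, t_end] at which f(t) >= level.

    Steps forward by `step`, then refines the bracketing interval by
    bisection. Returns None if no sampled point reaches `level`.
    """
    prev = t_start
    if f(prev) >= level:
        return prev
    t = t_start + step
    while t <= t_end:
        if f(t) >= level:
            lo, hi = prev, t           # f(lo) < level <= f(hi)
            for _ in range(60):
                mid = (lo + hi) / 2
                if f(mid) >= level:
                    hi = mid
                else:
                    lo = mid
            return hi
        prev, t = t, t + step
    return None


def usage(t: float) -> float:
    return 40.0 + 3.0 * t   # hypothetical fitted usage function


# Time at which usage reaches a hypothetical usable capacity of 85.
print(first_crossing(usage, level=85.0, t_start=0.0, t_end=100.0))
```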

What-if scenarios 144 can be evaluated using the functions of time. For example, one might want to know when usage will match capacity for host 201 if virtual machine VM1 is cloned to yield a third virtual machine VM3, as indicated in FIG. 2. In that case, because the demand of VM2 is constant, host demand will increase at double the rate it climbed before the cloning. Also, if a reservation is applied to VM1, a similar reservation could be applied to the clone. In that case, the sum of the reservations would affect the usable capacity for virtual machine VM2.
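The reservation effect can be expressed as a small calculation. This sketch assumes that reservations are hard guarantees that are simply subtracted from the host's usable capacity; the function name and the numbers are hypothetical.

```python
def usable_for(vm: str, host_usable: float, reservations: dict[str, float]) -> float:
    """Capacity a VM can count on: host usable capacity minus the
    reservations held by its siblings (treated as hard guarantees)."""
    sibling_reserved = sum(r for name, r in reservations.items() if name != vm)
    return max(host_usable - sibling_reserved, 0.0)


reservations = {"VM1": 20.0, "VM2": 0.0}
print(usable_for("VM2", 90.0, reservations))   # VM1's reservation applies

reservations["VM3"] = reservations["VM1"]      # what-if: VM3 clones VM1
print(usable_for("VM2", 90.0, reservations))   # both reservations apply
```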

In a what-if scenario, it is often useful to determine functions of time for CSMs of a parent component, e.g., a host, from CSMs of child components, e.g., virtual machines. In general, functions of time for sibling components can be combined to yield functions of time for their common parent component. However, the nature of the combination depends on the parameter involved. For example, the demand functions of time for siblings can be summed to yield a demand function of time for the parent. Usage functions of time can be summed subject to capacity limitations.
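Combining sibling functions into parent functions can be sketched with plain Python callables. The demand functions below are invented (VM1 growing, VM2 constant, VM3 a clone of VM1), each child's usage is taken to equal its demand, and parent usage is modeled as summed child usage capped at the parent's capacity; contention effects beyond that cap are ignored.

```python
from typing import Callable

TimeFn = Callable[[float], float]


def parent_demand(children: list[TimeFn]) -> TimeFn:
    """Parent demand: the plain sum of the children's demand functions."""
    return lambda t: sum(f(t) for f in children)


def parent_usage(children: list[TimeFn], capacity: float) -> TimeFn:
    """Parent usage: summed child usage, never exceeding parent capacity."""
    return lambda t: min(sum(f(t) for f in children), capacity)


def vm1(t: float) -> float:
    return 20.0 + 10.0 * t   # hypothetical growing demand


def vm2(t: float) -> float:
    return 30.0              # hypothetical constant demand


vm3 = vm1                    # what-if: VM3 is a clone of VM1

before = parent_demand([vm1, vm2])
after = parent_demand([vm1, vm2, vm3])
capped = parent_usage([vm1, vm2, vm3], capacity=100.0)
for t in (0.0, 1.0, 2.0):
    print(t, before(t), after(t), capped(t))
```

In this example, host demand grows by 20 units per step after cloning instead of 10, and the capped usage function stops at 100 once summed demand passes capacity.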

Estimates relating to single CSMs can be made and then combined to yield multi-dimensional estimates. For example, consider a server that is treated as having a processing CSM, a memory CSM, and a disk-storage input-output bandwidth CSM. One can estimate the time remaining before usage matches capacity for each dimension: processing, memory, and disk-storage input-output. The shortest of these estimates is the time remaining for the server.
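Combining single-dimension estimates into a multi-dimensional one reduces to taking a minimum. The sketch also reports which dimension is the bottleneck, which is an illustrative addition; the estimates are hypothetical values in days.

```python
def time_remaining(per_dimension: dict[str, float]) -> tuple[str, float]:
    """Multi-dimensional time remaining: the smallest single-dimension
    estimate, returned together with the bottleneck dimension."""
    bottleneck = min(per_dimension, key=per_dimension.__getitem__)
    return bottleneck, per_dimension[bottleneck]


estimates = {"processing": 42.0, "memory": 17.5, "disk_io": 60.0}  # hypothetical days
print(time_remaining(estimates))
```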

Almost inevitably, and typically as a result of capacity planning, the configuration of CPT system 102 (FIG. 1) will change. For example, capacity may be increased by adding a new host server. User 134 can then update model 108, which may then be retrained by machine-learning engine 140. Even without a configuration change, model 108 may have to be modified and retrained. For example, assume that the electric utility that supplies power for CPT system 102 has imposed rationing on CPT system 102. In that case, it might make sense to add CSMs with power consumption as the assigned dimension to model 108.

Furthermore, model 108 may be modified due to the addition of a new parameter to an existing CSM. The importance of an overlooked parameter might be discovered. For example, the variance in usage might affect the decision of when to increase capacity, as a high variance may result in many short impairments of performance. In that case, variance can be added as a parameter to the CSM whose usage is being tracked. This change might call for retraining by machine-learning engine 140. It is an advantage of capacity-analysis tool 120 and its use of one-dimensional CSMs that new dimensions and parameters are easily accommodated as they come into existence or emerge as important. Furthermore, the accommodation can be implemented by the user as opposed to by the CAT vendor.

Container model 109 (FIG. 1) characterizes the relationships between components specified in CPT configuration data 136. As explained below with reference to FIGS. 4 and 5, this makes it possible to do what-if modeling of multiple simulated changes to a system and see the impact of capacity and demand changes on other components, e.g., parents, children, and siblings.

CPT system 102, which can be a data center, is represented in FIG. 4 as a nested hierarchy. The data center is arranged in clusters, including a cluster 401 and other clusters 402. Cluster 401 includes host 201 and other hosts 403. Host 201 hosts virtual machines VM1 and VM2. Other hosts 403 respectively host virtual machines. Other clusters 402 can include additional hosts, each hosting one or more virtual machines. The data center, the clusters, the hosts, and the virtual machines are “resource containers” in that they include or have access to processing resources, memory resources, communications resources, etc.

CPT model 108 for CPT system 102 includes a model for each resource-container component 104 of CPT system 102. Data center model 501, shown in FIG. 5, is a model of CPT system 102, i.e., the data center as a whole. Data center model 501 is constituted by CSMs, including a processing CSM 502, a memory CSM 504, and other CSMs, e.g., for disk storage, disk-storage input-output bandwidth, network bandwidth, temperature, power consumption, etc. Cluster model 506 is a model of cluster 401 (FIG. 4) and is constituted by processing CSM 510, memory CSM 512, and other CSMs. Likewise, cluster models 508 are models of other clusters 402 contained by CPT system 102; each of cluster models 508 can include respective CSMs 514 for multiple dimensions.

Host model 210 is a model for host 201 and is constituted by processing CSM 212 and memory CSM 516, as shown in FIG. 5. Host models 516 are models of other hosts 403 and include CSMs 518. In addition, there are models for the hosts of clusters 402. Virtual-machine model 520 is a model of virtual machine VM1 and is constituted by processing CSM 214, a memory CSM 522, and CSMs for other dimensions. In addition, there are virtual-machine models 521 with respective sets of CSMs 524. As explained above, CPT model 108 includes algorithms 112 and, once trained, can include functions 114.

Container model 109 not only lists containers but also specifies containment relations between them. For example, line 530 indicates that CSM 214 is a model of a container that is itself contained by a container represented by host model 210. In this sense, host model 210 is a parent of virtual-machine model 520, while virtual-machine models 521 are siblings of virtual-machine model 520.

Host model 210 thus provides a multi-dimensional example. Host 201 hosts virtual machines VM1 and VM2, as shown in FIG. 2. One may wish to determine the time remaining under a what-if scenario in which one of the virtual machines is cloned and the clone (VM3) runs on the same host. Usage functions can be determined for each dimension of each virtual machine. For each dimension, the usage functions of time can be combined to give a host usage function for that dimension; usage functions can be combined by summing them subject to capacity and contention constraints. Alternatively, usage functions can be converted to demand functions, which can be summed without regard to capacity and contention constraints. Time remaining can then be estimated for each host dimension. The shortest time remaining is the time remaining for the host under the what-if scenario.

Note that the same capacity dimension, e.g., processing or memory, is represented at each level of the containment hierarchy. A parameter function for a dimension, e.g., processing amount, can be combined (e.g., summed) across siblings to yield a parameter function for that dimension for the parent. Thus, for example, combining processing usage functions of time for sibling CSMs 214 and 216 (FIG. 2) yields a processing usage function of time for parent (host) CSM 212. Likewise, combining processing usage across host models 210 and 516 can yield a processing function of time for parent cluster model 506. Combining processing functions of time for cluster models 506 and 508 can yield a processing function of time for data center model 501.
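The roll-up through the containment hierarchy can be sketched as a recursive sum over a tree of containers. For brevity, this example sums demand values at a single point in time rather than whole functions of time. The `Container` class, the container names (including the extra VM4), and the values are hypothetical, and usage would additionally need to be capped at each parent's capacity.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A resource container; leaves carry values, parents aggregate children."""
    name: str
    demand: dict[str, float] = field(default_factory=dict)  # dimension -> value at time t
    children: list["Container"] = field(default_factory=list)

    def rolled_up(self, dimension: str) -> float:
        """Demand for `dimension`: own value for a leaf, else sum over children."""
        if not self.children:
            return self.demand.get(dimension, 0.0)
        return sum(child.rolled_up(dimension) for child in self.children)


vm1 = Container("VM1", {"processing": 3.0, "memory": 8.0})
vm2 = Container("VM2", {"processing": 1.5, "memory": 4.0})
host_201 = Container("host-201", children=[vm1, vm2])
other_host = Container("other-host", children=[Container("VM4", {"processing": 2.0})])
cluster_401 = Container("cluster-401", children=[host_201, other_host])
data_center = Container("data-center", children=[cluster_401])

print(host_201.rolled_up("processing"), data_center.rolled_up("processing"),
      data_center.rolled_up("memory"))
```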

Because the relationships among resource-container components 104 are specified, the impact of an addition, deletion, or modification of one container on its children, siblings, parent, and other ancestors can be determined. In fact, the impacts on the system as a whole of even complex sets of changes can be determined. This enables what-if modeling of multiple simulated changes to a system so that the impact of capacity and demand changes on parents, children, and siblings can be determined.

A capacity-analysis process 600, flow-charted in FIG. 6, can be implemented in capacity-analysis system 100 and other systems. At 610, a model framework is provided that includes at least one closed-system template with a dimension role and plural parameter roles. At 620, a user applies expertise regarding a capacity-planning-target system to create a CPT model for which CSMs are the building blocks. The CPT model specifies hierarchical relationships among resource-container models; the CSMs belong to respective resource-container models. Each CSM specifies a dimension assigned to the dimension role and parameters assigned to at least some of the parameter roles. At 630, the CPT model is trained using a machine-learning engine so that time series are fit to functions of time and, in some cases, functions of other parameters.

At 640, capacity-analysis estimates are made using the CPT model. In some scenarios, the estimation procedure can be divided as follows. At 641, in a hierarchical system, functions for child components can be combined inter-component and intra-dimension to yield parent functions for each dimension. For example, demand functions can be summed across virtual machines sharing the same host to yield a demand function of time for the host. Likewise, time functions for hosts can be combined to determine time functions for clusters, and cluster functions of time can be combined to yield time functions for the entire data center. If there is no need to determine parent functions based on child functions, action 641 can be omitted.

At 642, single-dimension estimates can be made based on parent or other component functions for each dimension. For example, single-dimension time-remaining estimates for processing, memory, storage, storage input-output, etc. can be made for a virtual-machine host. At 643, single-dimension estimates are combined across dimensions to yield multi-dimensional estimates. For example, the shortest time remaining among the single-dimension time-remaining estimates is the multi-dimensional time-remaining estimate for the host.
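Steps 642 and 643 can be sketched as a per-dimension time-remaining calculation followed by taking the minimum across dimensions. This minimal sketch assumes each dimension's demand is linear in time and its capacity is constant, so the crossing point has a closed form; the names `LinearDemand`, `time_remaining`, and `host_time_remaining` and the example numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class LinearDemand:
    intercept: float  # demand now (t = 0)
    slope: float      # change in demand per unit time


def time_remaining(demand: LinearDemand, capacity: float) -> float:
    """Single-dimension estimate: time until demand reaches a constant capacity."""
    if demand.intercept >= capacity:
        return 0.0          # already at or over capacity
    if demand.slope <= 0:
        return math.inf     # flat or falling demand never reaches capacity
    return (capacity - demand.intercept) / demand.slope


def host_time_remaining(
    dims: Dict[str, Tuple[LinearDemand, float]]
) -> Tuple[str, float]:
    """Multi-dimensional estimate: the shortest single-dimension time remaining."""
    estimates = {name: time_remaining(d, cap) for name, (d, cap) in dims.items()}
    limiting = min(estimates, key=estimates.get)
    return limiting, estimates[limiting]


host = {
    "processing": (LinearDemand(40.0, 0.5), 100.0),   # reaches capacity at t = 120
    "memory": (LinearDemand(96.0, 0.25), 128.0),      # reaches capacity at t = 128
    "storage": (LinearDemand(500.0, 0.0), 1000.0),    # never reaches capacity
}
```

Returning the limiting dimension along with the time makes it clear which resource would need to be added first.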

In response to a configuration change or the addition of a new dimension or parameter, new CSMs can be added to the CPT model at 650. This can involve identifying a dimension not already assigned to any CSM in the CPT model. In addition, the adding can include assigning the identified dimension to a new CSM and adding the new CSM to the CPT model. At that point, process 600 returns to 630 for (re)training of the CPT model.
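Action 650 can be illustrated by comparing the dimensions observed in the configuration against those already assigned to CSMs, creating a CSM for each unassigned dimension, and flagging the model for retraining. This is a minimal sketch; the classes `CSM` and `CPTModel`, their fields, and the `needs_training` flag are hypothetical stand-ins for the models described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class CSM:
    dimension: str
    parameters: Dict[str, str] = field(default_factory=dict)  # role -> parameter


@dataclass
class CPTModel:
    csms: List[CSM] = field(default_factory=list)
    needs_training: bool = False

    def unassigned_dimensions(self, observed: Iterable[str]) -> List[str]:
        """Dimensions seen in the configuration but not assigned to any CSM."""
        assigned = {csm.dimension for csm in self.csms}
        # dict.fromkeys removes duplicates while preserving first-seen order.
        return [d for d in dict.fromkeys(observed) if d not in assigned]

    def add_missing_csms(self, observed: Iterable[str]) -> List[CSM]:
        """Assign each unassigned dimension to a new CSM and flag retraining."""
        new_csms = [CSM(d) for d in self.unassigned_dimensions(observed)]
        self.csms.extend(new_csms)
        if new_csms:
            self.needs_training = True  # i.e., return to training at 630
        return new_csms


model = CPTModel([CSM("processing"), CSM("memory")])
added = model.add_missing_csms(["processing", "memory", "storage-io", "storage-io"])
```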

Process 600 can be implemented on a computer 700, shown in FIG. 7. Computer 700 includes a processor 702, communications (including input-output) devices 704, and media 706. Media 706 is encoded with code 708 representing: estimates 106, CPT model 108, CAT 120, database 132, CPT configuration data 136, and machine-learning engine 140. Code 708 represents software that is used to program hardware to yield programmed hardware that achieves the described capacity-analysis functionality.

For any component, there can be multiple CSMs corresponding to different dimensions according to which the component can be described. The closed systems can be limited to independent dimensions. For example, disk access rate and disk access latency are not independent, as one can be determined from the other; accordingly, it would be an unnecessary processing burden to include CSMs for both in a CPT model.

For any given CSM and associated dimension, there can be two or more parameters that can be assigned. The units can vary; for example, processing capacity can be represented in MIPS, CPS, or ticks. Data time-averaged over a short period will look different from data time-averaged over a long period. In general, a long-period time series can be derived from a short-period time series.
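Deriving a long-period series from a short-period one can be illustrated by aggregating fixed-size blocks of consecutive samples. This minimal sketch assumes evenly spaced samples (for example, 5-minute samples aggregated into 15-minute points with `factor=3`); the function names are hypothetical. The two variants show why the two series "look different": averaging smooths out a short spike that a per-block maximum preserves.

```python
from statistics import fmean
from typing import List, Sequence


def _check(factor: int) -> None:
    if factor < 1:
        raise ValueError("factor must be a positive integer")


def downsample_mean(series: Sequence[float], factor: int) -> List[float]:
    """Average consecutive blocks of `factor` short-period samples.

    An incomplete trailing block is dropped so every output point covers
    the same span of time.
    """
    _check(factor)
    usable = len(series) - len(series) % factor
    return [fmean(series[i:i + factor]) for i in range(0, usable, factor)]


def downsample_peak(series: Sequence[float], factor: int) -> List[float]:
    """Same blocking, but keep each block's maximum so short spikes stay visible."""
    _check(factor)
    usable = len(series) - len(series) % factor
    return [max(series[i:i + factor]) for i in range(0, usable, factor)]


short_period = [10.0, 10.0, 70.0, 10.0, 10.0, 10.0, 99.0]  # last sample is a partial block
```

For capacity purposes, the choice between mean and peak aggregation matters: a mean-based long-period series can understate the demand that a component must actually absorb.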

Herein, a “system” is a set of interacting elements, wherein the elements can be, by way of example and not of limitation, hardware, atoms, and actions. Herein, a “capacity-planning target system” or “CPT system” is a system for which capacity planning is or will be applied. A “process” is a system in which the elements are actions. Herein, a “closed system” is a system that is capacity constrained in that usage may be constrained when usage reaches capacity. Certain elements described herein are in the form of programmed hardware, that is, software executing on hardware such as a computer.

Herein, a “model” is a tangible, non-transitory representation of an entity that, in some respects, simulates the entity. Herein, a “CPT model” is a model of a CPT system. A “model framework” is an entity to which information can be added to constitute a model. Herein, a “closed system” is a system that has a capacity limit, i.e., is capacity-constrained. A “CSM” is a model of a closed system, wherein the CSM can be used as a building block for a CPT model. A “correlation model” includes functions that permit values of one parameter to be estimated using values of another parameter. A “correlation table” is a table with items, in this case parameters, listed both in columns and in rows. The cells at the row-column intersections are for storing correlation values for the row-column pair.

Herein, a CPT model is used to make capacity-analysis estimates, that is, estimates that are useful in evaluating parameters that are in turn applicable to capacity planning. Typically, the parameters to be estimated concern relationships between capacity, on the one hand, and demand or usage on the other. The parameters to be estimated can include: an amount of capacity that a system has; an amount of time before capacity runs out; an amount of capacity remaining; an amount of capacity to meet current or future demand; and an amount of capacity currently being wasted.
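Several of the estimates listed above can be illustrated from a capacity value, a current usage value, and a demand function of time evaluated over a planning horizon. The source does not define these quantities precisely; this minimal sketch assumes "remaining" is capacity minus current usage, "capacity to meet demand" is the shortfall against peak forecast demand, and "wasted" is capacity above that peak. The names `CapacityEstimates` and `estimate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

TimeFn = Callable[[float], float]


@dataclass
class CapacityEstimates:
    remaining: float   # capacity minus current usage
    shortfall: float   # capacity to add to meet peak forecast demand
    wasted: float      # capacity above peak forecast demand


def estimate(capacity: float, current_usage: float,
             demand: TimeFn, horizon: Sequence[float]) -> CapacityEstimates:
    """Evaluate demand over the horizon and compare its peak with capacity."""
    peak = max(demand(t) for t in horizon)
    return CapacityEstimates(
        remaining=max(capacity - current_usage, 0.0),
        shortfall=max(peak - capacity, 0.0),
        wasted=max(capacity - peak, 0.0),
    )


near_term = estimate(100.0, 60.0, lambda t: 50.0 + t, [0, 10, 20])
long_term = estimate(100.0, 60.0, lambda t: 50.0 + t, [0, 60])
```

Whether "wasted" should be measured against peak demand or against demand plus a safety margin is a policy choice; the sketch uses the bare peak for simplicity.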

A “usage parameter” is a parameter relating to an amount of a resource used or consumed. A “capacity parameter” is an upper limit on the amount of a resource that can be used or consumed. A “demand parameter” is a parameter relating to an amount of a resource requested or needed to meet some objective. “At least partial function of time” means either a pure function of time, F(t), or a function of time and at least one other parameter, e.g., F(t, p). For example, a parameter may be a function of both time and one or more other parameters, e.g., identified in a correlation model.

Herein, “machine learning” includes a computer evaluating a training set of data so as to develop a model that permits other, e.g., future, data to be predicted. In the present context, machine learning fits time series of datapoints to at least partial functions of time, that is, functions determined by time alone or by time in conjunction with one or more other independent variables. Machine learning can recognize patterns, trends, and correlations in data that can be used to predict future data based on the patterns, trends, and correlations. Herein, the recognized patterns, trends, and correlations are used to refine a model created based on a model framework.
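Fitting an at least partial function of time, F(t, p), can be illustrated with the simplest such form. This minimal sketch assumes F(t, p) = a + b·t + c·p and fits it by ordinary least squares using the normal equations and a small Gaussian-elimination solver; least squares is a swapped-in technique, not the engine the source describes, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: time and parameter are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_time_and_parameter(
    times: Sequence[float], other: Sequence[float], values: Sequence[float]
) -> Callable[[float, float], float]:
    """Least-squares fit of values ~ a + b*t + c*p; returns F(t, p)."""
    rows = [[1.0, t, p] for t, p in zip(times, other)]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    a, b, c = _solve(xtx, xty)
    return lambda t, p: a + b * t + c * p


# Hypothetical datapoints generated from 1 + 2t + 3p.
times = [0, 1, 2, 3, 4]
other = [1, 0, 2, 1, 3]
values = [4, 3, 11, 10, 18]
f = fit_time_and_parameter(times, other, values)
```

If the other parameter moves in lockstep with time, the system becomes singular, which is another way of seeing why dependent variables add little to such a model.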

The foregoing embodiments, as well as further variations thereupon and modifications thereto, are provided for by the present invention, the scope of which is defined by the following claims.

Therefore, the following is claimed:
 1. A system, comprising: a computing device comprising a processor and a memory; and program instructions executable in the computing device, wherein the instructions, when executed, cause the computing device to at least: generate a target model for a targeted computing system that includes a first virtual machine and a second virtual machine, the target model comprising a plurality of system models for the first virtual machine and the second virtual machine, each of the plurality of system models representing one of a plurality of parameters of the first virtual machine and the second virtual machine, and each parameter having a capacity constraint; generate a function of time for each of the plurality of parameters based at least in part on a time series of datapoints for each of the plurality of parameters; identify an estimated point in time of contention between a first parameter for the first virtual machine and a second parameter for the second virtual machine based at least in part on the function of time and the capacity constraint; and determine a usable capacity for the first parameter and the second parameter based at least in part on the estimated point in time of contention and the function of time.
 2. The system of claim 1, wherein the estimated point in time of contention represents a respective point in time in which computing resource demand for the first parameter and the second parameter exceeds computing resource capacity for the first parameter and the second parameter.
 3. The system of claim 1, wherein the usable capacity comprises a first usable capacity for the first parameter and a second usable capacity for the second parameter, and the instructions, when executed, cause the computing device to at least: determine the second usable capacity for the second parameter of the second virtual machine based at least in part on the first usable capacity and a relationship between the first virtual machine and the second virtual machine.
 4. The system of claim 1, wherein determining the usable capacity is based at least in part on a third parameter for a third virtual machine that has been cloned from the first virtual machine.
 5. The system of claim 1, wherein the instructions, when executed, cause the computing device to at least: detect a configuration change for the first virtual machine or the second virtual machine; and determine an updated function of time for each of the plurality of parameters based at least in part on the configuration change and the time series of datapoints for each of the plurality of parameters.
 6. The system of claim 1, wherein the first parameter represents a first memory parameter for the first virtual machine and the second parameter represents a second memory parameter for the second virtual machine, and the function of time represents a first demand function of time and a second demand function of time.
 7. The system of claim 6, wherein identifying the estimated point in time of contention is based at least in part on a summation of the first demand function of time for the first memory parameter and the second demand function of time for the second memory parameter.
 8. A non-transitory computer-readable medium embodying program instructions executable in a computing device that, when executed by the computing device, cause the computing device to at least: generate a target model for a targeted computing system that includes a first virtual machine and a second virtual machine, the target model comprising a plurality of system models for the first virtual machine and the second virtual machine, each of the plurality of system models representing one of a plurality of parameters of the first virtual machine and the second virtual machine, and each parameter having a capacity constraint; generate a function of time for each of the plurality of parameters based at least in part on a time series of datapoints for each of the plurality of parameters; identify an estimated point in time of contention between a first parameter for the first virtual machine and a second parameter for the second virtual machine based at least in part on the function of time and the capacity constraint; and determine a usable capacity for the first parameter and the second parameter based at least in part on the estimated point in time of contention and the function of time.
 9. The non-transitory computer-readable medium of claim 8, wherein the estimated point in time of contention represents a respective point in time in which computing resource demand for the first parameter and the second parameter exceeds computing resource capacity for the first parameter and the second parameter.
 10. The non-transitory computer-readable medium of claim 8, wherein the usable capacity comprises a first usable capacity for the first parameter and a second usable capacity for the second parameter, and the program instructions, when executed by the computing device, cause the computing device to at least: determine the second usable capacity for the second parameter of the second virtual machine based at least in part on the first usable capacity and a relationship between the first virtual machine and the second virtual machine.
 11. The non-transitory computer-readable medium of claim 8, wherein determining the usable capacity is based at least in part on a third parameter for a third virtual machine that has been cloned from the first virtual machine.
 12. The non-transitory computer-readable medium of claim 8, wherein the program instructions, when executed by the computing device, cause the computing device to at least: detect a configuration change for the first virtual machine or the second virtual machine; and determine an updated function of time for each of the plurality of parameters based at least in part on the configuration change and the time series of datapoints for each of the plurality of parameters.
 13. The non-transitory computer-readable medium of claim 8, wherein the first parameter represents a first memory parameter for the first virtual machine and the second parameter represents a second memory parameter for the second virtual machine, and the function of time represents a first demand function of time and a second demand function of time.
 14. A method, comprising: generating, by a computing device, a target model for a targeted computing system that includes a first virtual machine and a second virtual machine, the target model comprising a plurality of system models for the first virtual machine and the second virtual machine, each of the plurality of system models representing one of a plurality of parameters of the first virtual machine and the second virtual machine, and each parameter having a capacity constraint; generating, by the computing device, a function of time for each of the plurality of parameters based at least in part on a time series of datapoints for each of the plurality of parameters; identifying, by the computing device, an estimated point in time of contention between a first parameter for the first virtual machine and a second parameter for the second virtual machine based at least in part on the function of time and the capacity constraint; and determining, by the computing device, a usable capacity for the first parameter and the second parameter based at least in part on the estimated point in time of contention and the function of time.
 15. The method of claim 14, wherein the estimated point in time of contention represents a respective point in time in which computing resource demand for the first parameter and the second parameter exceeds computing resource capacity for the first parameter and the second parameter.
 16. The method of claim 14, wherein the usable capacity comprises a first usable capacity for the first parameter and a second usable capacity for the second parameter, and further comprising: determining, by the computing device, the second usable capacity for the second parameter of the second virtual machine based at least in part on the first usable capacity and a relationship between the first virtual machine and the second virtual machine.
 17. The method of claim 14, wherein determining the usable capacity is based at least in part on a third parameter for a third virtual machine that has been cloned from the first virtual machine.
 18. The method of claim 14, further comprising: detecting, by the computing device, a configuration change for the first virtual machine or the second virtual machine; and determining, by the computing device, an updated function of time for each of the plurality of parameters based at least in part on the configuration change and the time series of datapoints for each of the plurality of parameters.
 19. The method of claim 14, wherein the first parameter represents a first memory parameter for the first virtual machine and the second parameter represents a second memory parameter for the second virtual machine, and the function of time represents a first demand function of time and a second demand function of time.
 20. The method of claim 19, wherein identifying the estimated point in time of contention is based at least in part on a summation of the first demand function of time for the first memory parameter and the second demand function of time for the second memory parameter.